为了解决不平衡分类任务中生成图像的质量多样性的权衡问题,我们研究了功能级别的基于过度采样的方法,而不是数据级别,并专注于搜索潜在功能空间以进行最佳分布。在此基础上,我们提出了改进的基于潜在特征分布演化(MEDA_LUDE)算法的改进的估计分布算法,其中对联合学习程序进行了编程,以使深神经网络和进化算法分别优化和进化。我们探讨了大利润度高斯混合物(L-GM)损失功能对分配学习和设计基于样品之间相似性以增加多样性的专业健身函数的影响。基于基准的不平衡数据集的广泛实验验证了我们提出的算法的有效性,该算法可以生成具有质量和多样性的图像。此外,MEDA_LUDE算法还应用于工业领域,并成功地减轻了织物缺陷分类中的不平衡问题。
translated by 谷歌翻译
注意机制已成为场景文本识别方法(STR)方法中的事实上的模块,因为它有能力提取字符级表示。可以将这些方法汇总到基于隐性注意力的基于隐性的注意力和受监督的注意力中,取决于如何计算注意力,即分别从序列级别的文本注释和字符级别的边界框注释中学到隐性注意和监督注意力。隐含的注意力可能会提取出粗略甚至不正确的空间区域作为性格的注意,这很容易受到对齐拖延问题的困扰。受到监督的注意力可以减轻上述问题,但它是特定于类别的问题,它需要额外费力的角色级边界框注释,并且当角色类别的数量较大时,将是记忆密集的。为了解决上述问题,我们提出了一种新型的关注机制,用于STR,自我保护的隐式字形注意力(SIGA)。 Siga通过共同自我监督的文本分割和隐性注意对准来描述文本图像的字形结构,这些文本分割和隐性注意对准可以作为监督,以提高注意力正确性,而无需额外的角色级注释。实验结果表明,就注意力正确性和最终识别性能而言,SIGA的性能始终如一地比以前的基于注意力的STR方法更好,并且在公开可用的上下文基准上以及我们的无上下文基准。
translated by 谷歌翻译
我们建议以人为本的4D场景捕获(HSC4D)准确有效地创建一个动态的数字世界,其中包含大规模的室内场景,各种各样的人类动作以及人类与环境之间的丰富互动。 HSC4D仅使用车身安装的IMU和LIDAR,没有任何外部设备的限制和无图形地图,没有预构建的地图。考虑到IMU可以捕获人的姿势,但始终为长期使用而漂移,而LiDar对于全球本地化却是稳定的,但对于本地位置和方向而言,HSC4D使两个传感器通过联合优化和实现长期的有希望的结果相互补充。捕获。还探索了人与环境之间的关系,以使其相互作用更加现实。为了促进许多下游任务,例如AR,VR,机器人,自动驾驶等,我们提出了一个数据集,其中包含三个大型场景(1k-5k $ m^2 $),并具有准确的动态人类动作和位置。各种场景(攀岩馆,多层建筑,坡度等)以及挑战人类活动(锻炼,上下楼梯,攀岩等)展示了HSC4D的有效性和概括能力。数据集和代码可在http://www.lidarhumanmotion.net/hsc4d/上获得。
translated by 谷歌翻译
我们提出了一种新的基于网格的学习方法(N-Cloth),适用于合理的3D布变形预测。我们的方法是通用的,可以处理具有任意拓扑的三角网格表示的布料或障碍物。我们使用Graph卷积将布料和对象网格转换为潜在空间以减少网格空间中的非线性。我们的网络可以基于初始布网格模板和目标障碍物网的状态来预测目标3D布网格变形。我们的方法可以处理复杂的布料网格,最高可达100美元的k三角形和场景,具有与SMPL人,非SMPL人或刚体相对应的各种对象。在实践中,我们的方法展示了连续输入框架之间的良好时间相干性,并且可用于在NVIDIA GeForce RTX 3090 GPU上以30-45美元的$ 30-45 $ FPS产生合理的布料模拟。我们突出了以前基于学习的方法和基于物理的布料模拟器的好处。
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译
For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. The existing empirical or physical models can reveal important information regarding the degradation dynamics. However, there is no general and flexible methods to fuse the information represented by those models. Physics-Informed Neural Network (PINN) is an efficient tool to fuse empirical or physical dynamic models with data-driven models. To take full advantage of various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion-batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of Li-ion Phosphate (LFP)/graphite batteries.
translated by 谷歌翻译